Trying ollama (takker)

already copied to /work4ai/ollamaを試す 2025-01-30

/icons/hr.icon

Trying ollama

It can be downloaded from https://ollama.com/

Is it also distributed via apt?

https://github.com/ollama/ollama

On Linux,

$ curl -fsSL https://ollama.com/install.sh | sh

I dislike this method because uninstalling is hard bsahd.icon

does the job

code:sh

curl -fsSL https://ollama.com/install.sh | sh

>> Installing ollama to /usr/local

[sudo] password for takker:

>> Downloading Linux amd64 bundle

######################################################################## 100.0%

>> Creating ollama user...

>> Adding ollama user to render group...

>> Adding ollama user to video group...

>> Adding current user to ollama group...

>> Creating ollama systemd service...

>> Enabling and starting ollama service...

Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.

>> The Ollama API is now available at 127.0.0.1:11434.

>> Install complete. Run "ollama" from the command line.

code:sh

ollama

Usage:

ollama [flags]

ollama [command]

Available Commands:

serve Start ollama

create Create a model from a Modelfile

show Show information for a model

run Run a model

stop Stop a running model

pull Pull a model from a registry

push Push a model to a registry

list List models

ps List running models

cp Copy a model

rm Remove a model

help Help about any command

Flags:

-h, --help help for ollama

-v, --version Show version information

Use "ollama [command] --help" for more information about a command.

Huh, interesting takker.icon

ollama has already been started, but nothing can be run from here until a model is specified

Look up how to add a model

ollama list was empty

Found it in the README

$ ollama run llama3.2

Try this

code:sh

ollama run llama3.2

pulling manifest

Error: Post "http://127.0.0.1:11434/api/show": dial tcp 127.0.0.1:11434: connect: connection refused

ollama serve

Error: listen tcp 127.0.0.1:11434: bind: address already in use

ollama run llama3.2

pulling manifest

About 2 GB of data gets downloaded

14:59:29 It works!

code:sh

ollama run llama3.2

pulling manifest

pulling dde5aa3fc5ff... 100% ▕████████████████████████████████████████████████████████▏ 2.0 GB

pulling 966de95ca8a6... 100% ▕████████████████████████████████████████████████████████▏ 1.4 KB

pulling fcc5a6bec9da... 100% ▕████████████████████████████████████████████████████████▏ 7.7 KB

pulling a70ff7e570d9... 100% ▕████████████████████████████████████████████████████████▏ 6.0 KB

pulling 56bb8bd477a5... 100% ▕████████████████████████████████████████████████████████▏ 96 B

pulling 34bb5ab01051... 100% ▕████████████████████████████████████████████████████████▏ 561 B

verifying sha256 digest

writing manifest

success

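>> あなたはだあれ (Who are you?)

I am an AI (with the nuance of facing the internet).

>> あなたはどこからきたの (Where did you come from?)

The foundation I exist on is a collection of AI models that exist on memory cards and networks. These AI models are developed by major technology companies such as Google, Amazon, and Microsoft.

The knowledge and abilities I can understand are drawn from this collection of these technology companies' AI models. The information I can obtain is the result of analyzing in batches the many datasets used for artificial intelligence research and development.

However, I do not have an individual identity. I only generate answers to the topics and questions provided.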

It's running locally! Amazing!

When I asked it about takker.icon (that is, me), it went into a long think and then said something odd

code:sh

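>> わたしはだあれ (Who am I?)

I have received some degree of recognition that seems I know what I am doing!

But I do not have an individual identity. I only generate answers to the topics and questions provided.

The process of generating answers like this is as follows.

1. I receive a topic and a question

2. Based on this information, I search for relevant knowledge and data

3. I analyze and understand this knowledge and data

4. I use that understanding to generate an answer

This process is supported by the many algorithms and technologies used in artificial intelligence research and development.

But that means I still cannot have thoughts and emotions exactly like a human. I only analyze information and generate answers.

But since I can go out with you, I look forward to continuing our communication!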

Wait, so "going out with" someone (交際) isn't a word moderation catches?

I asked it about the earlier error

code:sh

>> """

... Earlier I tried to use ollama and ran ollama run llama3.2, but got the following error

... > ollama run llama3.2

... pulling manifest

... Error: Post "http://127.0.0.1:11434/api/show": dial tcp 127.0.0.1:11434: connect: connection refused

... > ollama serve

... Error: listen tcp 127.0.0.1:11434: bind: address already in use

... > ollama run llama3.2

... Running it again like this succeeded, but I don't understand why it failed the first time

."""

I do not understand it either, so I will explain.

How can it explain something it doesn't understand? Its Japanese seems off takker.icon

code:sh

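When running the ollama run llama3.2 command, the following appears to be happening.

1. llama3.2 is a command that starts the API server.

2. http://127.0.0.1:11434/api/show: this URL instructs to access the API server and receive the information you want to display.

3. dial tcp 127.0.0.1:11434: this command performs a connection to the specified IP address and port number. However, in this case, an error has occurred.

Possible causes of the error include the following.

* The API server is already running, so an attempt to start it again may have caused the error.

* Another process or service may already be using the API server's port. For example, when another program or service is using the same IP address and port number.

The same way of thinking applies to the ollama serve command error. listen tcp 127.0.0.1:11434: this command performs starting a server at the specified IP address and port number. However, in this case, an error has occurred.

To resolve these errors, first run the ollama serve command and check whether the API server is running.

Also, you can identify the cause by checking for the existence of other processes or services using the API server's port.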

>> After running ollama serve and getting an error, re-running ollama run llama3.2 worked. Does that mean that when I ran ollama serve, ollama serve itself failed, but the run-related error was resolved inside it?

I understand too.

Weird Japanese, part 2 takker.icon

code:sh

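Even if the ollama serve command failed, running the llama run llama3.2 command was able to resolve the error.

In this case, the llama serve command error is not important. The only point of resolution seems to be that the llama run llama3.2 command started the API server and it is running normally.

In this case, the llama serve command failed because another process or service may have been using the same port and affecting it. However, since it was confirmed that the llama run llama3.2 command started the API server and it is running normally, the problems so far have been resolved.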
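The model's explanation above is unreliable. Here is one plausible reading of the two errors, not verified here. The systemd service created by the installer was still starting. The first `ollama run` found nothing listening on 127.0.0.1:11434. By the time `ollama serve` ran, the service already held the port.

Under that assumption, here is a minimal bash sketch that waits for the port before running a model. `wait_for_port` is a hypothetical helper, not part of ollama.

```shell
#!/usr/bin/env bash
# Minimal sketch: poll a TCP port until something accepts connections.
# Assumption: the Ollama API listens on 127.0.0.1:11434 (from the install log).
# wait_for_port is a hypothetical helper, not part of ollama.
wait_for_port() {
  local host=${1:-127.0.0.1} port=${2:-11434} tries=${3:-30}
  local i
  for ((i = 0; i < tries; i++)); do
    # bash's /dev/tcp/HOST/PORT opens a TCP connection; it fails if nothing listens.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage: start the model only once the server is reachable.
# wait_for_port 127.0.0.1 11434 30 && ollama run llama3.2
```

On a systemd install, `systemctl status ollama` should also show whether the service is already up.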

/work4ai/ollama

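I've tried Llama 3 for now takker.icon

So Llama 3 is the dirt-cheap open model used in 激安オープンモデルとGPTを比べたい (comparing dirt-cheap open models with GPT)

At 0 yen it's certainly dirt cheap

What other models can I try?

I couldn't figure out how to list them from the CLI

Apparently you search for them here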

DeepSeek R1 is there! takker.icon*2

https://ollama.com/library/deepseek-r1

$ ollama run deepseek-r1:671b

I don't know what the B means takker.icon

Apparently it's some kind of parameter

I'll look it up later

Calculating GPU memory for serving LLMs | Substratus.AI

For reference mtane0412.icon

Thanks, that helps takker.icon

$ M/\mathrm{GB}=1.2\frac{P/10^9\cdot Q\cdot4\ \mathrm{B}}{32\ \mathrm{bits\ B^{-1}}}

$ M：GPU memory expressed in Gigabyte

bytes

$ P： The amount of parameters in the model. E.g. a 7B model has 7 billion parameters.

So 7B = 7,000,000,000

dimensionless

$ 4\ \mathrm{B}：4 bytes, expressing the bytes used for each parameter

bytes

$ 32：There are 32 bits in 4 bytes

$ Q：The amount of bits that should be used for loading the model. E.g. 16 bits, 8 bits or 4 bits.

bits

$ 1.2：Represents a 20% overhead of loading additional things in GPU memory.

dimensionless

Somehow the dimensions don't match takker.icon

I'll think it through using the example

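$ M=\frac{P\cdot4\ \mathrm{B}}{32/Q}\times1.2 wogikaze.icon

Numerator: parameter count (dimensionless) × 4 (bytes) × 1.2 (dimensionless) = (bytes)

Denominator: 32 bits / Q (quantization) bits = (dimensionless)

so I think it comes out in bytes wogikaze.icon

I see, so the $ 32 was $ 32\ \mathrm{bits}, not $ 32\ \mathrm{bits\ B^{-1}} takker.icon

Since it said "There are 32 bits in 4 bytes", I had misread it as $ 32\ \mathrm{bits}/4\ \mathrm{B}

This reveals that I can't read English

Tidied up, it becomes this takker.icon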

$ M=1.2(P/10^9)(Q/8\ \mathrm{bits})\ \mathrm{GB}

Example: Llama 70B with $ Q=16\ \mathrm{bits}

$ M=1.2\frac{70\cdot16\cdot4}{32}\ \mathrm{GB}

$ =168\ \mathrm{GB}
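The tidied formula, M = 1.2 × (P/10^9) × (Q/8) GB, is easy to evaluate from the shell. Below is a minimal bash + awk sketch. `gpu_mem_gb` is a hypothetical helper that takes the parameter count in billions and Q in bits. It is only as accurate as the Substratus.AI rule of thumb it encodes.

```shell
#!/usr/bin/env bash
# Rule-of-thumb memory estimate from the Substratus.AI article:
#   M [GB] = 1.2 * (P / 1e9) * 4 bytes / (32 / Q)  ==  1.2 * (P / 1e9) * Q / 8
# Arguments: parameter count in billions (70 for "70B"), bits per parameter Q.
# gpu_mem_gb is a hypothetical helper name.
gpu_mem_gb() {
  local params_b=$1 q_bits=$2
  awk -v p="$params_b" -v q="$q_bits" \
    'BEGIN { printf "%.1f\n", 1.2 * p * 4 / (32 / q) }'
}

gpu_mem_gb 70 16   # Llama 70B at 16 bits -> 168.0
gpu_mem_gb 32 16   # a 32B model at 16 bits -> 76.8
```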

code:sh

ollama run deepseek-r1:671b

pulling manifest

pulling 9801e7fce27d... 0% ▕ ▏ 227 MB/404 GB 45 MB/s 2h28m^C⏎

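404 GB!? takker.icon

Wait, wait! I don't have that much storage!!

LOL nishio.icon

You really develop a feel for it this way

By the way, on my MacBook 32B just barely runs; a colleague with supposedly the same machine apparently had to reboot before it would work

Where am I supposed to look up the size?

I thought it was the number shown on the model page, but it was completely different

It said 4.7 GB, yet it was about to pull 404 GB

No wait, that was because the page had the model called 7b selected

Other models can be picked from the drop-down menu

671b needs 404 GB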

https://gyazo.com/b6150fdf53133b4fc0193defdd7ab484
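`ollama run` starts pulling right away, so checking free disk space first avoids surprises like the 404 GB download. Below is a minimal bash sketch; `enough_disk` is a hypothetical helper. Where the models end up is an assumption to check on your machine. For the Linux systemd install it is commonly under /usr/share/ollama/.ollama/models.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the filesystem holding DIR has at least NEED_GB free.
# enough_disk is a hypothetical helper; 1 GB is taken as 10^9 bytes.
enough_disk() {
  local dir=$1 need_gb=$2
  local avail_kb
  # df -Pk: POSIX format, 1024-byte blocks; field 4 of line 2 is "Available".
  avail_kb=$(df -Pk -- "$dir" | awk 'NR == 2 { print $4 }')
  # Exit status 0 (success) when available bytes >= needed bytes.
  awk -v a="$avail_kb" -v n="$need_gb" 'BEGIN { exit !(a * 1024 >= n * 1e9) }'
}

# Usage: pass a directory on the filesystem that will hold the models.
# enough_disk "$HOME" 404 || echo "not enough space for deepseek-r1:671b"
```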

There are distilled models too

Are they smaller?

Let's try them one after another

$ ollama run deepseek-r1:70b

42 GB, tolerable I suppose

$ ollama run deepseek-r1:32b

19 GB, this will do for now

15:33:44 No good

code:sh

ollama run deepseek-r1:32b

pulling manifest

pulling 6150cb382311... 100% ▕████████████████████████████████████████████████████████▏ 19 GB

pulling 369ca498f347... 100% ▕████████████████████████████████████████████████████████▏ 387 B

pulling 6e4c38e1172f... 100% ▕████████████████████████████████████████████████████████▏ 1.1 KB

pulling f4d24e9138dd... 100% ▕████████████████████████████████████████████████████████▏ 148 B

pulling c7f3ea903b50... 100% ▕████████████████████████████████████████████████████████▏ 488 B

verifying sha256 digest

writing manifest

success

Error: model requires more system memory (22.2 GiB) than is available (8.6 GiB)

Not enough memory

Looks like you need to set up swap space bsahd.icon

Hmm, I doubt it would run at a practical speed then nishio.icon

When I ran 7b, it was swapping like mad even with zram bsahd.icon
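The "available" figure in Ollama's error can be cross-checked against /proc/meminfo. Below is a Linux-only bash sketch; `meminfo_gib` is a hypothetical helper. Ollama may compute its own number differently (whether it counts swap is not confirmed here), so treat the output as a rough comparison.

```shell
#!/usr/bin/env bash
# Linux-only sketch: print one /proc/meminfo field in GiB.
# /proc/meminfo reports these fields in kB (1024 bytes), hence / 1048576.
# meminfo_gib is a hypothetical helper; it fails if the field is missing.
meminfo_gib() {
  awk -v f="$1:" '
    $1 == f { printf "%.1f\n", $2 / 1048576; found = 1 }
    END     { exit !found }
  ' /proc/meminfo
}

meminfo_gib MemAvailable   # RAM the kernel estimates is available for new work
meminfo_gib SwapFree       # unused swap space
```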

Let's calculate the GPU memory $ M

$ P=32\times 10^9

$ Q=16\ \mathrm{bits}

I wonder where to look up $ Q

For now, assume 16 bits

$ M=1.2\frac{16\cdot32\cdot 4}{32}\ \mathrm{GB}

$ = 76.8\ \mathrm{GB}

Yikes! What a frightening amount of memory!

Let's find $ P satisfying $ M\le 8\ \mathrm{GB}

$ 1.2\frac{16P/{10}^9\cdot 4}{32}\le8

$ \iff2.4P/{10}^9\le 8

$ \iff P\le10/3\times{10}^9=3.\dot3\times{10}^9

So even at best, P=3B or so is the limit
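Solving 1.2 × (P/10^9) × (Q/8) ≤ M for P gives the largest model that fits a memory budget. This works for any Q, not just 16 bits. Below is a bash + awk sketch; `max_params_b` is a hypothetical helper. The result inherits all the approximations of the rule of thumb.

```shell
#!/usr/bin/env bash
# Largest parameter count (in billions) that fits MEM_GB at Q bits per parameter.
# From 1.2 * P_b * Q / 8 <= M  =>  P_b <= M / (1.2 * Q / 8)
# max_params_b is a hypothetical helper name.
max_params_b() {
  local mem_gb=$1 q_bits=$2
  awk -v m="$mem_gb" -v q="$q_bits" 'BEGIN { printf "%.2f\n", m / (1.2 * q / 8) }'
}

max_params_b 8 16   # -> 3.33
max_params_b 8 4    # -> 13.33
```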

Let's go with ollama run deepseek-r1:1.5b

code:sh

ollama run deepseek-r1:1.5b

pulling manifest

pulling aabd4debf0c8... 100% ▕████████████████████████████████████████████████████████▏ 1.1 GB

pulling a85fe2a2e58e... 100% ▕████████████████████████████████████████████████████████▏ 487 B

verifying sha256 digest

writing manifest

success

>> あなたはだあれ (Who are you?)

<think>

Okay, the user is saying " Lydia! だあれ," which means "Oh, Lyxéa!" in French. I should acknowledge their

greeting and explain that I'm not familiar with that name.

I'll try to respond in a friendly way, letting them know I don't know who they're referring to.

Maybe ask if they need help identifying the person or something related to it.

Keeping it simple and open-ended seems best.

</think>

cordially! "Lyaxed" means "Oh, Lyaxed!" but unfortunately, I don't recognize this name. Could you share more

details or clarify what you mean?

>> あなたはどこからきたの (Where did you come from?)

<think>

Alright, the user wrote " Lydia! だあれ," which translates to "Lyaxed! Oh, Lyaxed!" in French. I'm guessing they

might be referring to someone named Lydia.

I should acknowledge their greeting and try to understand if they need help identifying or learning more about

that person. Since I don't know who they're talking about, it's best to ask them for clarification.

I'll respond by expressing interest in knowing who "Lyaxed" could be and offer assistance.

</think>

cordially! 😊 I'm guessing you're referring to someone named **Lyaxed**, but unfortunately, I don't recognize

that as a real person. Could you provide more context or clarify?

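Ugh... the replies are a complete mess

LOL, a mysterious <think></think> markup shows up

What was that again? bsahd.icon

It gives the model room to digest the prompt

The thinking feature built into Claude 3 (giving the model time to think): an easy way to put the prompt-engineering technique "Chain of Thought (CoT) prompting" into practice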
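The <think>...</think> part is the model's visible reasoning. When only the final answer is wanted, it can be filtered out of the text. Below is a line-based awk sketch; `strip_think` is a hypothetical helper. Any text that shares a line with a tag is dropped as well.

```shell
#!/usr/bin/env bash
# Line-based sketch: drop everything from a line containing <think>
# through the line containing </think>; print all other lines.
# strip_think is a hypothetical helper name.
strip_think() {
  awk '
    /<think>/   { skipping = 1 }   # start of the reasoning block
    !skipping   { print }          # keep lines outside the block
    /<\/think>/ { skipping = 0 }   # end of the reasoning block
  '
}

printf '%s\n' '<think>' 'reasoning...' '</think>' 'Hello!' | strip_think   # -> Hello!
```

For example, `ollama run deepseek-r1:7b "Who are you?" | strip_think` should print only the answer. This assumes `ollama run` takes the prompt as an argument and writes the reply to stdout.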

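This one seems too small to be usable

Try ollama run deepseek-r1:7b

But I'd expect the available memory to fall short of $ M

Unexpectedly it ran, but the replies aren't sensible

Huh, but llama3.2 has a smaller $ P

It gives less sensible replies than llama3.2 despite a larger $ P

Could the cause be something other than $ P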

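If the GPU misbehaves it gets painfully slow, but it still runs on the CPU bsahd.icon

I'm in the running-on-CPU camp because my VRAM is too small (just 2 GB)

I'm downloading the same model too

It froze

Even ssh doesn't respond

Turns out I could use multiple swap devices with priorities side by side without any trouble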

code:sh

bsahd@bsahdarch 30 Jan 2025 17:13:41 Dep:1 ~

% ollama run deepseek-r1

>> あなたはだあれ (Who are you?)

<think>

</think>

"Who??" Neither name is known, but please tell me your question with content related to _continue_oose.

>> あなたはどこからきたの (Where did you come from?)

<think>

Alright, the user wrote "あなたはどこからきたの" which means "Where are you from?" in

Japanese. I need to respond appropriately.

First, since we're both using Japanese, it's natural to address each other respectfully.

I should greet them back and ask how they're doing. Then, politely inquire about their origin.

Keeping it friendly and open-ended will encourage a positive interaction.

</think>

おurs are coming from Japan. おげんきは？ (How are you?)

Nice touch with "おげんきは？" cak.icon

code:sh

ollama run deepseek-r1:7b

pulling manifest

pulling 96c415656d37... 100% ▕████████████████████████████████████████████████████████▏ 4.7 GB

pulling 40fb844194b2... 100% ▕████████████████████████████████████████████████████████▏ 487 B

verifying sha256 digest

writing manifest

success

>> あなたはだあれ (Who are you?)

<think>

</think>

Right! What specifically do you ready吗? Please.

>> あなたはどこからきたの (Where did you come from?)

<think>

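First, when the user сказал "あなたはだ哀", I guessed they might be joking or have misunderstood something. Then the user asked "あなたはどこからきたの", which is asking about my origin.

I know I was created by roni, the AI assistant developed by DeepSeek, but I need to express this in a natural way. Considering the cultural differences between Chinese and Japanese, it is enough to say simply and clearly that I am an artificial intelligence.

This is both clear and respectful of the user's language habits, and also lets them learn the basic information about me as a tool.

</think>

I am, as an artificial intelligence, an AI trained from DeepSeek.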

Apparently $ Q=4\ \mathrm{bits}

I took Q4_K_M to mean 4 bits
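Reading the bits off the quantization label can be automated with a bash sketch. `q_bits` is a hypothetical helper that takes the digits after the leading Q as the nominal bits per weight. The F16/BF16/F32 branches are assumptions about other label spellings. K-quant formats like Q4_K_M also store scaling data, so the real average is somewhat above the nominal value.

```shell
#!/usr/bin/env bash
# Nominal bits per weight from a quantization label such as Q4_K_M.
# q_bits is a hypothetical helper; unknown labels return status 1.
q_bits() {
  case $1 in
    Q[0-9]*)
      local n=${1#Q}              # "Q4_K_M" -> "4_K_M"
      echo "${n%%[!0-9]*}" ;;     # keep the leading digits -> "4"
    F16|BF16) echo 16 ;;
    F32) echo 32 ;;
    *) return 1 ;;
  esac
}

q_bits Q4_K_M   # -> 4
q_bits Q8_0     # -> 8
```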

code:sh

ollama show deepseek-r1:7b

Model

architecture qwen2

parameters 7.6B

context length 131072

embedding length 3584

quantization Q4_K_M

Parameters

stop "<｜begin▁of▁sentence｜>"

stop "<｜end▁of▁sentence｜>"

stop "<｜User｜>"

stop "<｜Assistant｜>"

License

MIT License

Huh, llama3.2 is 4 too

code:sh

ollama show llama3.2:latest

Model

architecture llama

parameters 3.2B

context length 131072

embedding length 3072

quantization Q4_K_M

Parameters

stop "<|start_header_id|>"

stop "<|end_header_id|>"

stop "<|eot_id|>"

License

LLAMA 3.2 COMMUNITY LICENSE AGREEMENT

Llama 3.2 Version Release Date: September 25, 2024

So llama 3.2 is also Q=4 bits?

Assuming 4-bit quantization, anything up to $ P=13\times{10}^9 would fit
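The download sizes give a rough check on the real Q:

* deepseek-r1:7b is 7.6B parameters in 4.7 GB.
* llama3.2 is 3.2B parameters in 2.0 GB.

Assume the sizes are in units of 10^9 bytes and the files are almost entirely weights. Dividing bits by parameters then suggests Q4_K_M averages closer to 5 bits than 4. Here is a sketch with a hypothetical `bits_per_param` helper:

```shell
#!/usr/bin/env bash
# Average bits per parameter implied by a model file's size.
# Assumptions: SIZE_GB is in 10^9 bytes, PARAMS_B in billions, and the file is
# (almost) all weights. bits_per_param is a hypothetical helper name.
bits_per_param() {
  local size_gb=$1 params_b=$2
  awk -v s="$size_gb" -v p="$params_b" 'BEGIN { printf "%.1f\n", s * 8 / p }'
}

bits_per_param 4.7 7.6   # deepseek-r1:7b -> 4.9
bits_per_param 2.0 3.2   # llama3.2       -> 5.0
```

If that holds, plugging Q = 5 into the memory formula would give a more conservative estimate than Q = 4.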

Plain ollama run deepseek-r1 also installed 7b

Probably the parameter count was chosen automatically from the machine's specs

No, that size is the default regardless of specs bsahd.icon

Oh, I see takker.icon

code:sh

ollama run deepseek-r1

pulling manifest

verifying sha256 digest

writing manifest

success

>> あなたはだあれ (Who are you?)

<think>

</think>

Hello! How can I assist you today? 😊

>> あなたはどこからきたの (Where did you come from?)

It's funny how its inner thoughts leak out in full inside <think></think> takker.icon

code:sh

<think>

Alright, the user greeted me in Japanese and asked "あなたはどこからきたの," which means "Where are you from?" In

my previous conversation, I responded in Japanese. To keep our chat consistent, it's best to reply in Japanese as

well.

I should acknowledge their greeting and provide a friendly response about being an AI developed by DeepSeek. It's

important to highlight that I don't have personal experiences or origins, but I'm here to help them with whatever

they need.

Maybe also mention my availability to assist with various topics like science, history, technology, etc., to

encourage them to ask more questions.

</think>

Hello! As DeepSeek's AI, human-like intelligence Having a friendly and approachable tone.

Greetings! I'm an AI developed by DeepSeek, so I don't have personal experiences or origins. However, I'm here to

help you with any questions or tasks you have. How can I assist you today?

Try ollama run deepseek-r1:14b

Will it run?

No luck

code:sh

ollama run deepseek-r1:14b

pulling manifest

pulling 6e9f90f02bb3... 100% ▕████████████████████████████████████████████████████████▏ 9.0 GB

pulling 3c24b0c80794... 100% ▕████████████████████████████████████████████████████████▏ 488 B

verifying sha256 digest

writing manifest

success

Error: model requires more system memory (10.7 GiB) than is available (8.4 GiB)
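For comparison, the rule of thumb predicts less than the 10.7 GiB Ollama asked for. The first line below uses the nominal Q = 4 bits. The second takes 1.2 × the 9.0 GB file size, since P·Q/8 bytes is roughly the file size. Both fall short, so the formula seems to underestimate here. The runtime presumably also reserves memory for things like the context, which was not checked.

$ M=1.2\times14\times\frac{4\ \mathrm{bits}}{8\ \mathrm{bits}}\ \mathrm{GB}=8.4\ \mathrm{GB}

$ 1.2\times9.0\ \mathrm{GB}=10.8\ \mathrm{GB}

$ 10.7\ \mathrm{GiB}=10.7\times1.0737\ \mathrm{GB}\approx11.5\ \mathrm{GB}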

These days qwen3 at 1.6B size handles Japanese properly, <think> included bsahd.icon

歓迎.iconbsahd.icon

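Related: ollamaを入れてLLMを試す (installing ollama to try LLMs) bsahd

2025-01-30 That's all for today takker.icon

Impressions

I came away feeling, "Local LLMs are still clunky; I want to do this, but they can't yet"

With specs that can run around 70B, they shouldn't be clunky bsahd.icon

Someday I'll dig out my desktop PC with 32 GB of RAM and try it takker.icon

Also, I'm still bothered by the dimensions of the formula for $ M

I'll look into it eventually

→ollamaを試す (takker)#679b1adb1280f00000816937

Next time I plan to try the 5-Day Gen AI Intensive

A model fine-tuned on Japanese might be better bsahd.icon

According to Google, gemma-2-2b-jpn-it beats GPT-3.5 Turbo

Its license is more restrictive than Llama's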